Accountability is a topic on everyone's
mind. In just about every state, schools are 
being held accountable for student perfor- 
mance under systems put into effect over the 
past 5-10 years. As states are providing reme- 
dies and enacting sanctions for low perfor- 
mance, policymakers are realizing the daunt- 
ing implications of the task in front of them. 
In over half the states, students will have to 
pass a state test to graduate from high school; 
concerns about large numbers of failures, 
particularly for minority students, are 
mounting. The recent reauthorization of the 
Elementary and Secondary Education Act, 
the No Child Left Behind Act (NCLB) of 2001, 
sets new requirements for state accountabili- 
ty systems as a condition of federal aid for 
disadvantaged children. As a result, states 
are actively reexamining their accountability 
policies. 

To assist in the redesign of accountability 
systems, the Consortium for Policy Research 
in Education (CPRE) and the Center for 
Research on Evaluation, Student Standards, 
and Testing (CRESST) sought to assemble 
knowledge from new research on emerging 
accountability systems. A book, Redesigning
Accountability Systems for Education, edited by
Susan H. Fuhrman and Richard E. Elmore
(Teachers College Press, in press), contains 
chapters by leading accountability 
researchers. This issue of CPRE Policy Briefs 
summarizes the book by focusing on four 
questions the authors of the book address: 

1) How valid are new accountability sys- 
tems? 

2) How fair are new accountability sys- 
tems? 



3) What are the effects of new account- 
ability systems? 

4) What is necessary to improve the func- 
tioning of accountability systems? 

This Policy Brief reviews the many issues 
that states are confronting as they implement 
accountability systems, and provides guid- 
ance for states looking to fine-tune or 
redesign accountability systems to help meet 
policies as they were intended. Specifically, 
this Brief offers recommendations for 
improving accountability systems by enhanc- 
ing the use of expert technical advice, by 
improving the collection and interpretation 
of system data, and by investing in capacity 
building to ensure that both students and 
educators have the necessary means to effec- 
tively respond to accountability systems. 

Background 

The accountability systems written about 
in the book are those established over the 
past 5-10 years, mostly at the state level, 
although a number of districts have similar 
systems. NCLB accountability provisions 
also reflect the same principles. 

These systems are distinguished by their 
attention to school-level performance and by 
their inclusion of consequences for that per- 
formance. They are quite different from earli- 
er approaches to accountability that primari- 
ly focused on district compliance with state 
regulations. The new systems grow out of a 
climate that draws strong parallels between 
education and business; they intend to focus 
schools on the bottom line. They also reflect 
an attempt, strong in rhetoric if not reality, by 
states to back off from detailed regulations 
about the process of education. The new systems
reflect an explicit theory of action about
improving student achievement that stresses 
the motivation of teachers, students, and 
administrators. 

The new systems assume that, when they 
are operating as intended: 

• Performance, or student achievement, is the 
key value or goal of schooling and that con- 
structing accountability around performance 
focuses attention on it. Since the indices that 
are used to measure school status and 
progress are composed primarily of 
achievement measures, the systems are 
intended to maximize focus on those mea- 
sures. 

• Performance can be accurately and authenti- 
cally measured by the assessment instruments 
in use. Assessments are aligned to student 
standards and gauge achievement of those 
standards in reliable and valid ways. If 
accountability is to hinge on performance, 
then the key measures used by the 
accountability system must correctly 
assess performance. Further, the new sys- 
tems generally assume that school perfor- 
mance can be fairly assessed through test- 
ing; only in a few states do accountability 
systems include provisions for visiting and 
reviewing schools by observing teaching 
and learning. 

• Consequences, or stakes, motivate school per- 
sonnel and students. Not only do those sub- 
ject to stakes focus more on performance, 
they try harder because both positive 
inducements (such as bonuses) and nega- 
tive sanctions (such as school takeover, 
reconstitution, or denial of promotion or 
graduation) are meaningful and real. 

• Improved instruction and higher levels of per- 
formance will result. Teachers trying harder 
to teach and students trying harder to 
learn will connect to mean better interaction
around content. The assessments will
also help promote good instruction by pro- 
viding feedback on student performance. 
Following this assumption, motivation is a 
key to improving instruction. If teachers 
don't have the capacity necessary to 
respond to the accountability system 
incentives, it is assumed that the incentives 
are strong enough to motivate them and 
administrators to find it somehow, by
seeking additional professional develop- 
ment, for example. Also, attaching conse- 
quences at the school level assumes that 
schools will collectively be able to fashion 
a response and, therefore, that they have, 
or will be motivated to form, some sort of 
internal coherence. Many accountability 
systems are accompanied by policies to 
build capacity and most have some assis- 
tance strategies embedded in the conse- 
quences for failing schools, but account- 
ability policies focus primarily on altering 
the incentive structure as a means to 
improving instruction and performance. 
This primary reliance on incentives to 
motivate teachers and schools to do 
something the schools have never done 
before — to succeed with essentially all 
students — suggests that these systems 
make an important additional assumption 
or set of assumptions (i.e., that teachers 
already know how to succeed with all stu- 
dents but choose not to, or don't expect to, 
with some, or that at least somebody 
knows how to succeed so that, if motivat- 
ed, others can learn how to do it too). 

• Unfortunate unintended consequences are 
minimal. If the systems work as intended, 
the goal of higher performance will not be 
undermined by perverse incentives or 
other negative developments. For exam- 
ple, instruction will improve, not become 
narrowly focused around test-taking skills, 
higher hurdles for high school graduation 
will not increase dropping out, and hold- 
ing schools accountable will not cause 
exclusion of special-needs students from 
testing or retention of students in non-test- 
ed grades. 

Are the new systems working as intend- 
ed? Are the assumptions borne out? We can 
address these questions by asking how valid 
and fair the systems are; by asking about 
their effects on motivation, instruction, and 
performance; and by asking what would be
needed to improve system function. 

How Valid are New 
Accountability Systems? 

This question asks whether the new 
accountability systems are accurately focus- 
ing on student learning as their rhetoric sug- 
gests. In other words, are assessments sensi- 
tive to instruction — are they correctly mea- 
suring student learning — and are they put to 
appropriate uses, given the information they 
provide? 

With respect to assessment, a first ques- 
tion has to do with the adequacy and appro- 
priateness of test content and of the cognitive 
demands of the test (Baker & Linn, in press). 
Commonly, this issue is spoken of as "align- 
ment." Are tests measuring achievement of 
the knowledge and skills states expect of stu- 
dents? Do tests adequately cover the content 
in state standards in terms of both topic cov- 
erage and level of rigor? The evidence is not 
encouraging on this score. Achieve, an inde- 
pendent, bipartisan, nonprofit organization, 
founded by a group of governors and chief 
executive officers in 1996, has worked with 
over 20 states examining alignment. It found 
that state tests do cover standards (nearly all 
test items measure content in standards), but 
coverage is often superficial with tests mea- 
suring the least complex of the skills called 
for. Tests tend to be unbalanced, measuring 
some standards but not others. For example, 
some state high school math tests tend to 
focus mostly on numbers and measurement, 
with little emphasis on algebra and geometry. 
High school tests are particularly problemat- 
ic, posing a relatively low level of challenge, 
which is in sharp contrast to fourth- and 
eighth-grade tests in some of the same states 
(Rothman, in press). Achieve reviewed only
one commercially available test; at grade 10,
one-quarter of the test questions did not 
match the state standards at all. This is in 
contrast to the findings about significant 
matches between standards and tests in 
states that design their own tests and raises 
concerns about the likely increased use of less 
expensive commercial tests in response to 
NCLB's increased testing requirements. 
States that use commercial tests will need to 
ensure that test publishers augment the 
examinations in ways that align with the 
state's standards. 



Accountability systems establish levels of 
performance, or cut scores, in order to define 
a certain level of achievement as "proficient" 
or "basic." States use different approaches 
and methods to set cut scores and, some- 
times, the methods are not well elaborated. 
The fluidity of these definitions is illustrated 
by the action of some states in the wake of 
NCLB's requirement to bring all students to 
"proficient" within 12 years. As of this writ- 
ing, at least three states have changed their 
definition of "proficient," using for federal 
purposes a level previously called "basic" or 
"partially proficient." Further, measurement 
error is associated with any test score, and 
both individual students and schools can be 
misclassified as either "proficient" or "not 
proficient" simply because of random error. 
Research has shown that the probability of 
misclassification is substantial. When 
accountability systems require disaggregated 
reporting of scores, measurement and sam- 
pling error is greater for schools with large 
numbers of subgroups. In addition, the dan- 
ger of misclassification is greater for small 
schools. Error rates increase as the number of
students decreases. It is not uncommon for the
best and worst performers to include large 
numbers of small schools, probably placed 
into these categories by error. It is not clear 
that policymakers know the probability of 
misclassification or that such information is 
provided to all users (Baker & Linn, in press). 
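
To make the point about school size concrete, the following is a minimal sketch, not a method taken from the book. It assumes that a school's observed mean score varies around its true mean with a standard error equal to the student-level standard deviation divided by the square root of the number of students tested, and that this error is approximately normal. The function name and every number used (cut score, standard deviation, school sizes) are hypothetical.

```python
import math


def normal_cdf(z: float) -> float:
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def misclassification_probability(
    true_mean: float, cut_score: float, student_sd: float, n_students: int
) -> float:
    """Chance that a school's observed mean lands on the wrong side of the cut.

    Assumption (illustrative only): the observed school mean is normally
    distributed around the true mean with standard error sd / sqrt(n).
    """
    if n_students <= 0 or student_sd <= 0:
        raise ValueError("n_students and student_sd must be positive")
    standard_error = student_sd / math.sqrt(n_students)
    p_observed_below_cut = normal_cdf((cut_score - true_mean) / standard_error)
    # A school that truly meets the cut is misclassified when it is observed
    # below it; a school that truly falls short is misclassified otherwise.
    if true_mean >= cut_score:
        return p_observed_below_cut
    return 1.0 - p_observed_below_cut


# Hypothetical schools whose true mean is 2 points above a cut score of 50.
for size in (25, 100, 400):
    p = misclassification_probability(52.0, 50.0, 20.0, size)
    print(f"{size:4d} students tested: misclassification chance {p:.3f}")
```

Under these assumed numbers, a school of 25 students has roughly a 31% chance of being rated below the cut despite truly exceeding it, while a school of 400 has roughly a 2% chance. The same logic applies to small subgroups within a school.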

Another aspect of validity has to do with 
whether scores represent learning or other 
factors. A "status" score or achievement level 
reflects the student's or students' background
as much as it does any learning that took 
place in the year of testing. For that reason, 
many states include gain scores or improve- 
ment ratings in their accountability mea- 
sures. Some use both the percentage of stu- 
dents at a given achievement level and a gain 
score to rate schools. States use different 
models for judging improvement. For exam- 
ple, they can look at changes in the perfor- 
mance of successive years of students for the 
same grade or they can look at changes in 
performance from one grade to the next for 
students who were tested in both years. The 
problem with the first model is that different 
cohorts of students can have very different 
characteristics. In addition, in areas with a lot 
of mobility, the turnover of students in a 
school could be responsible for performance
changes, not learning. The second model is 
very appealing because it holds schools 
accountable for the value they add; it controls 
for student background by controlling for 
student achievement. Even though this 
approach may not completely eliminate the 
effect of background factors, such as student 
access to help at home during the school year, 
it comes closer than the other approach (Linn, 
in press). 
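
The difference between the two improvement models can be sketched as follows. This is an illustration under stated assumptions, not a description of any state's actual formula: the function names and scores are hypothetical, and the matched-student calculation assumes scores from adjacent grades are reported on a common scale.

```python
from statistics import mean


def successive_cohort_change(last_year_scores, this_year_scores):
    """Model 1: change in the mean score of the same grade across two years.

    The two lists hold scores for different groups of students.
    """
    return mean(this_year_scores) - mean(last_year_scores)


def matched_student_gain(prior_scores, current_scores):
    """Model 2: mean gain for students who were tested in both years.

    Both arguments map a student identifier to a score. Students who
    left or arrived between the two tests are ignored.
    """
    matched = prior_scores.keys() & current_scores.keys()
    if not matched:
        raise ValueError("no students were tested in both years")
    return mean(current_scores[s] - prior_scores[s] for s in matched)


# Hypothetical school: this year's fourth graders started from a lower
# baseline than last year's fourth graders, but each tested student improved.
grade4_last_year = [60, 62, 64]
grade3_last_year = {"a": 40, "b": 42, "c": 44, "d": 70}  # "d" moved away
grade4_this_year = {"a": 50, "b": 52, "c": 54, "e": 65}  # "e" is new

print(successive_cohort_change(grade4_last_year,
                               [grade4_this_year[s] for s in ("a", "b", "c")]))
print(matched_student_gain(grade3_last_year, grade4_this_year))
```

In this made-up case the successive-cohort comparison shows a 10-point decline, while the matched students each gained 10 points; cohort differences and mobility, not instruction, account for the gap between the two readings.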

Validity also concerns whether measures 
are put to appropriate use. Many testing 
experts would fault today's accountability 
systems on several grounds. Consequences 
are often applied on the basis of a single mea- 
sure of mastery, rather than using multiple 
measures that tap into different ways of 
demonstrating competence in a content 
domain. Policymakers often say they are 
using multiple measures when they provide 
multiple opportunities to take the same test, 
but that is not the same as having multiple 
assessments. Also, the chances of misclassifi- 
cation raise doubts about the application of 
harsh consequences based on a single test 
administration. In addition to giving stu- 
dents multiple opportunities to take tests that 
count for graduation or promotion, some 
states are averaging scores for schools over a 
period of years. 

Furthermore, the use of a test score to 
impose a reward or a sanction presumes that 
the individuals whose actions produce those 
scores — teachers and students — have the 
wherewithal and know-how to do their job. 
For a score to be valid, teachers have to 
understand and act on the import of the 
accountability information. Because school 
accountability systems focus almost exclu- 
sively on outcomes, they produce little in the 
way of reliable information about classroom 
practice. Nor do typical schools have other 
mechanisms for collecting and sharing infor- 
mation about instruction across classrooms. 
Hence, school personnel rarely have enough 
data to figure out what factors produce given 
outcomes and design a remedy; they lack suf- 
ficient data for attribution (O'Day, in press). 
And importantly, for accountability systems 
to be valid, teachers must have the capacity 
to teach students the knowledge and skills to 
be assessed. A major issue cited by authors in
the book is lack of instructional capacity — of 
materials and teacher knowledge and skill — 
and therefore of opportunity to learn. If 
accountability theory suggests that by pro- 
viding strong incentives, teachers will muster 
abilities they lacked before the incentives 
were imposed, then the theory must be fault- 
ed. Capacity does not magically appear, as 
will be seen when effects are examined. 

How Fair are New 
Accountability Systems? 

Fairness has a lot to do with validity and 
is hard to single out. It is not a valid use of a 
test to allocate consequences for teachers 
based on the scores of their students if varia- 
tions in those scores are heavily influenced 
by factors other than quality of instruction or 
instructional effort. So, if low student socioe- 
conomic status depresses scores in a school, 
the average score is neither a fair nor valid 
measure of that school's instructional efforts. 
And with respect to students, applying con- 
sequences according to their test scores, if 
they have lacked the opportunity to learn the 
material, raises both validity and fairness 
questions. But in this section we take up 
some aspects of fairness not discussed above: 
the inclusion of students with disabilities and 
low levels of English proficiency, the dispari- 
ties in achievement among subgroups, and 
the uneven application of consequences. 

States are required by federal law to 
include students with disabilities and limited 
English proficient (LEP) students in their 
assessments and accountability systems, and 
NCLB is significantly more directive about 
inclusion than past policies. However, in 
recent years, some 36 states were cited by the 
federal government for problems with 
including students with disabilities, and 33 
were cited for problems with including LEP 
students (Thurlow, in press). Sometimes 
states provide adequate accommodations for 
students (such as longer time to take a test), 
but do not include scores for all such accom- 
modated students in their accountability sys- 
tems. Exclusion of scores is even more preva- 
lent for students who use alternate assess- 
ments or "nonstandard" accommodations, 
accommodations the student's Individual- 
ized Education Program team deems impor- 
tant even though they are not on the state list. 
As Thurlow (in press) puts it, "Unless states 
figure out a way to include students with
non-approved accommodations in account- 
ability systems, they may create incentives to 
designate students with non-approved 
accommodations solely as a way of excluding 
them." 

Even though federal law regulates inclu- 
sion in accountability systems related to Title
I, it is silent with regard to inclusion of disabled
or LEP students in state graduation
and promotion testing. Policymakers may 
give such students alternative diplomas, but 
aside from the well-established GED (Gener- 
al Educational Development Diploma), the 
value of such alternatives in terms of a stu- 
dent's future educational or employment 
opportunities is uncertain (Heubert, in 
press). Disabled, LEP, and minority students 
fail state graduation tests at much higher 
rates than other students. Even in Texas, 
which according to some studies has made 
significant progress in reducing the achieve- 
ment gap on its high school test between 
Whites and other students, the failure rates
for Hispanic and African American students
were more than double those of Whites as of
1998 (Carnoy, Loeb, & Smith, 2001; Natriello
& Pallas, 2001 as cited in Heubert, in press). 
In states with more rigorous high school 
assessments, both the disparities and the fail- 
ure rates, with some as much as 60-90%, are 
higher. Mounting evidence about the greater 
prevalence of underprepared and misas- 
signed teachers in high-poverty schools 
(Ingersoll & Jerald, 2002) suggests that 
opportunity to learn is not evenly distrib- 
uted. At the same time, it is becoming harder 
for students to bring legal action regarding 
these tests. In 2001, the Supreme Court decided
that private individuals could no longer
bring "disparate impact" cases under Title VI
of the Civil Rights Act of 1964, which had been
a powerful tool in the past (Heubert, in 
press). 

While some students seem more at risk 
from new accountability systems than others, 
it is worth noting that students generally face 
more consequences than adults under new 
state accountability systems. Stakes are seri- 
ously imbalanced, applying more harshly to 
students than to schools and the adults that 
work in them. The adults are somewhat sheltered
by the fact that a school is a collective of
individuals; consequences are diffused 
throughout the organization rather than 
falling on specific individuals, but students 
bear the brunt of consequences as individuals 
(Elmore, in press; O'Day, in press; Siskin, in 
press). Stakes fall unambiguously on stu- 
dents, who, unlike the adults who are sup- 
posed to be providing them with the oppor- 
tunity to learn, do not have the means to 
defend themselves politically. If they are rep- 
resented at all in debates about accountabili- 
ty, it is by adults who have their own interests 
to protect (Elmore, in press). In addition, 
states seem to be moving ahead in the appli- 
cation of stakes on low-performing students 
— withholding promotion or graduation — 
while dramatically withdrawing from apply- 
ing the consequences their policies require 
for low school performance. Because they 
lack the capacity to conduct in-depth reviews 
and to provide assistance, states are typically 
targeting for action many fewer schools than 
are eligible for remedies on performance cri- 
teria (Fuhrman, Goertz, & Duffy, in press). 

Given the mixed record of including all 
children in accountability policies, the dis- 
parate impact of these systems on different 
groups of students, and the uneven applica- 
tion of stakes, it would be hard to argue that 
new accountability systems are currently fair, 
although improvement on these factors may 
come over time. This is certainly what policy- 
makers are promising. The more overarching 
issue of fairness with respect to the new poli- 
cies is one we have mentioned before: Do students
have the opportunity to learn the material
on which they are being assessed? As
Elmore (in press) points out, "In a society 
where educational attainment is heavily 
related to future income, retention in grade, 
denial of diplomas, and dropping out have 
consequences that are extremely serious for 
students." It is unethical to punish students 
for not learning content they have not been 
taught. What do we know about opportunity 
to learn? For that, we turn to evidence about 
the effects of new accountability policies. 

What are the Effects of New 
Accountability Policies? 

A central point is that the effects of 
accountability policies vary. New account- 
ability systems certainly get the attention of 
teachers and other school personnel. Teachers
and principals report significant effects of
assessments on curriculum and instruction 
and studies have shown that they allocate 
their time according to the centrality of sub- 
jects in the testing system. In other words, 
assessment policies are motivating and lead 
to modifications in practice (Herman, in 
press). But how teachers actually respond to 
the signal, how they modify curriculum and 
instruction, differs quite a bit from school to 
school and even from teacher to teacher.

A number of studies find that curriculum 
and instruction become narrowed as a result 
of increased focus on state assessments. Non- 
tested subjects are given short shrift and 
teachers use the test format as a model for 
instruction, so when multiple-choice items 
dominate in the assessment, they are includ- 
ed in teacher worksheets as well. In addition, 
teachers report spending significant amounts 
of time in test preparation, more so in schools 
serving high-poverty children (Firestone, 
Camilli, Yurecko, Monfils, & Mayrowetz, 
2000; and Herman & Golan, 1993 as cited in 
Herman, in press). On the other hand, atten- 
tion to the assessment can mean adding to 
the curriculum, depending on the nature of 
the assessment. Researchers found teachers 
including more problem-solving tasks and 
writing in states like Maryland and Kentucky 
(Firestone, Mayrowetz, & Fairman, 1998; 
Stecher & Barron, 1999; and Stecher, Barron, 
Kaganoff, & Goodwin, 1998 as cited in Her- 
man, in press). And some teachers in all 
states studied rose to the challenge and tried 
to enhance instruction to meet new stan- 
dards. 

What accounts for this variation? Some of 
the difference has to do with the nature of the 
assessments; more sophisticated assessments 
with open-ended items are modeled in prac- 
tice just as less sophisticated tests are associ- 
ated with worksheets and drill-type activities 
(Herman, in press). But, in large measure, the 
variation reflects differences in capacity 
among schools; in the knowledge, skill levels, 
and belief systems of teachers; in the ability 
of the school to fashion a collective response 
to external accountability; and in the effec- 
tiveness of leadership. Accountability sys- 
tems do not by themselves appear to mobi- 
lize new capacity; schools' responses to them 
depend heavily on the capacity they already
have. As Elmore (in press) puts it, "The best 
predictor of how a school will respond to the 
introduction of stakes at Time 1 is its organi- 
zational culture and capacity at Time 0. . ." As 
pressure increases, low-capacity schools may 
add academic content and remediation, but 
without deliberate capacity building, they are 
unlikely to make large improvements in their 
core instructional capacity. Schools with 
more attention to and capacity for academic 
success often respond to accountability pres- 
sure in ways that increase their academic 
focus and coherence (Elmore, in press). 

Even in high schools, where generally 
standards reforms seem to have had the least 
effect to date and achievement gains have yet 
to be seen, varied responses to accountability 
systems are evident. High schools are being
asked to do what they have never done 
before — bring all children to common high 
standards, instead of differentiating academic
content. They have difficulty focusing on
a few academic subjects — the ones most 
likely to be tested — since this threatens the 
importance of faculty in many other depart- 
ments. As we have seen, students, many of 
whom come to high school far behind in their 
academic progress, are increasingly the tar- 
gets of accountability pressures as individu- 
als. Yet, some high schools respond more 
constructively than others. Schools that are 
more academically focused to begin with face 
the challenge of providing their academic 
programs to all, not just most students. But 
that is less daunting than the situation of 
schools without serious academic focus; they 
must now invent it. To some teachers in such 
schools, who often find less than half of their 
students graduating, the challenge of prepar- 
ing students most at risk seems impossible 
(Siskin, in press). When such very low-capac- 
ity schools respond to accountability systems 
by focusing more on performance, they may 
be on a long-term improvement trajectory, 
but they may also be complying in a pro 
forma way, without much deep capacity 
building (DeBray, Parson, & Avila, 2003).

Despite the fact that capacity-related vari- 
ation is the predominant finding in studies of 
classroom effects of accountability policies, 
there are some overall trends in achievement 
data. Looking at eighth-grade mathematics 
scores on the National Assessment of Educational
Progress (NAEP) from 1996-2000,
Carnoy and Loeb (in press) found significant- 
ly larger gains in states with strong account- 
ability systems (those with significant conse- 
quences for schools and students), across all 
racial and ethnic groups and particularly for 
African Americans. While fourth-grade 
results were not as strong, the relationship 
between forceful accountability and African 
American performance was noted there as 
well. Although there has been some national 
debate about accountability tests leading to 
increased retention in ninth grade, and stu- 
dents dropping out, Carnoy and Loeb did not 
find such a relationship.' However, they also 
did not note any positive effect of account- 
ability systems on student attainment. The 
new systems are not holding students in high 
school at greater rates, nor are they leading to 
greater rates of college-going. Since postsec- 
ondary behavior, particularly college atten- 
dance, has substantial long-term effects on 
income and life opportunities, we can only 
hope that score increases in lower grades 
eventually presage not only better student 
performance but also better student attain- 
ment at higher grades. It is important to note 
that there is considerable variation from state 
to state in NAEP scores, even among states 
with strong accountability systems. Strong 
accountability systems send a signal, but as 
shown, they do not in themselves provide the 
capacity necessary for students and schools 
to respond constructively. 

What is Necessary to Improve 
Accountability Systems? 

Accountability systems need not be set in 
stone. They can be refined and improved 
over time. As of this writing, states have 
planned changes to their systems in response 
to NCLB that they will now have to design in 
detail and implement. This process could 
provide opportunities for improvement. 

Many observers have recommended significant
changes in existing accountability
systems, such as increasing the use of multiple
measures or assuring that adults bear
consequences before students. CPRE and
CRESST have developed standards for
accountability systems (see sidebar)
to help policymakers develop more
valid, fair, and effective systems (Baker, Linn,
Herman, Koretz, & Elmore, 2001). Redesigning
Accountability Systems for Education
includes many other specific recommendations
about improving accountability systems
(Elmore, in press; Herman, in press;
Heubert, in press; O'Day, in press). Several
themes run through those recommendations.

' In an earlier study of Texas, Carnoy, Loeb, and Smith
(2001) found that an increase in ninth-grade retention in
Texas pre-dated the TAAS (Texas Assessment of Academic
Skills) system, though it may have stemmed from earlier
increases in high school graduation requirements.

First, technical information about assessment
and accountability systems must be brought to 
bear when policymakers deliberate accountability 
systems. Policymakers need to know the error 
terms of assessments, for example, so they 
can determine the chances of misclassifying 
students or schools based on a test adminis- 
tration. They need to know how validly the 
assessment aligns with their standards, using 
measures of alignment in addition to cover- 
age, how validity could be improved by 
including additional measures, and the 
trade-offs among various means of setting 
cut scores. If they set requirements for 
schools to make certain amounts of progress, 
they need to know if those requirements are 
feasible, given past performance and likely 
gains. They also need to know the advan- 
tages and challenges of using value-added 
accountability models as opposed to other 
models. Certainly, accountability systems are 
deeply political, with much consideration of 
possible winners and losers coming into deci- 
sions about how to structure them. But if pol- 
icymakers want to advance their overall aim 
of improved performance, they need solid 
technical information from independent, 
credible sources — from experts — in addi- 
tion to those with a vested interest in pro- 
moting a particular assessment. 

Second, additional information about the 
education system is necessary to interpret 
accountability system performance data. At least 
three kinds of information greatly enhance 
the ability of users to make sense of and act 
on performance data produced by an 
accountability system. Enhanced information 
would also make possible a broader array of 
measures of the health of the education sys- 
tem, hopefully alleviating the intense focus 
on assessments. 
CPRE/CRESST Standards for Accountability Systems

Standards on System Components 

• Accountability systems should employ different types of data from multiple sources. 

• The weighting of elements in the system, different test content, and different information sources should be 
made explicit. 

• Accountability systems should include data elements that allow for interpretations of student, institution, and 
administrative performance. 

• Accountability expectations should be made public and understandable for all participants in the system. 

• Accountability systems should include the performance of all students, including subgroups that historically have been 
difficult to assess. 

Testing Standards 

• Decisions about individual students should not be made on the basis of a single test. 

• Multiple test forms should be used when there are repeated administrations of an assessment. 

• The validity of measures that have been administered as part of an accountability system should be documented for the 
various purposes of the system. 

• If tests are to help improve system performance, data should be provided illustrating that the results are modifiable by 
quality instruction and student effort. 

• If test data are used as a basis of rewards or sanctions, evidence of technical quality of the measures and error rates asso- 
ciated with misclassification of individuals or institutions should be published. 

• Evidence of test validity for students with different language backgrounds should be made available publicly. 

• Evidence of test validity for children with disabilities should be made available publicly. 

• If tests are claimed to measure content and performance standards, evidence of the relationship to particular standards 
or sets of standards should be provided. 

Stakes 

• Stakes for accountability systems should apply to adults and students. 

• Incentives and sanctions should be coordinated for adults and students to support system goals. 

• Appeal procedures should be available to contest rewards and sanctions. 

• Stakes for results and their phase-in schedule should be made explicit at the outset of the implementation of the system. 

• Accountability systems should begin with broad, diffuse stakes and move to specific consequences for individuals and 
institutions as the system aligns. 

Public Reporting Formats 

• System results should be made broadly available to the media, with sufficient time for reasonable analysis and with clear 
explanations of legitimate and potential illegitimate interpretations of results. 

• Reports to districts and schools should promote appropriate interpretation and use of results by including multiple indi- 
cators of performance, error estimates, and performance by subgroup. 

Evaluation 

• Longitudinal studies should be planned, implemented, and reported, evaluating effects of the accountability program. 
Minimally, questions should determine the degree to which the system: builds capacity of staff; affects resource alloca- 
tion; supports high-quality instruction; promotes student equity access to education; minimizes corruption; affects 
teacher quality, recruitment, and retention; and produces unanticipated outcomes. 

• The validity of test-based inferences should be subject to ongoing evaluation. In particular, evaluation should address: 
aggregate gains in performance over time and impact on identifiable student and personnel groups. 
• More data about classroom-level curricu-
lum and instruction would help school
users figure out why test scores are at cer- 
tain levels and decide what to do about it. 
Without information about practice, 
schools are limited in designing remedies 
for poor performance. As O'Day (in press) 
points out, professional accountability sys- 
tems that include opportunities for peer 
exchange about practice provide greater 
knowledge for action than bureaucratic 
systems that include only information 
about results. 

• More knowledge about the state of oppor- 
tunity to learn would help policymakers 
design fairer systems. Knowing the extent 
to which students are truly being taught 
the material to be assessed would help pol- 
icymakers determine realistic progress 
goals and assess the fairness of conse- 
quences. Knowing the variation in oppor- 
tunity to learn would help policymakers 
channel additional resources and assis- 
tance to needy schools. Several states wor- 
ried about high failure rates on high school 
exit exams undertook studies of opportu- 
nity to learn that were instrumental in set- 
ting timelines for the initiation of conse- 
quences (Fuhrman, Goertz, & Duffy, in 
press). 

• Evaluations of accountability systems are 
essential. Good evaluations would indi- 
cate whether students are being appropri- 
ately included in assessments, whether 
assessments have disparate impacts on 
various groups, whether classroom prac- 
tice is changing in response to assessment 
(in ways both intended and not intended), 
whether the remedies provided for poor per-
formance work, and a host of other equal-
ly critical questions. Evaluations will show
whether teachers have the ability to do the 
expected job and whether that capacity is 
fairly distributed. 

Third, capacity building is essential. Deliber-
ate interventions to improve teacher knowl-
edge and skill, to provide extra assistance to
students at risk of failure, and to build school
communities capable of responding to per-
formance pressure are necessary. Further,
states and districts need added capacity if 
they are to assist schools and intervene in 
instruction. Without investments of this type
in capacity, improvements related to account-
ability systems are likely to be short-lived and
superficial, and inequities are likely to 
increase. Policymakers have worked hard on 
the motivation side of the equation in devel- 
oping accountability policies; they must 
work equally hard on providing educators 
and students the wherewithal to respond to 
the new incentives. 

Finally, to enhance the use of expert tech- 
nical advice, to improve information gather- 
ing, and to invest in capacity, policymakers 
need political stamina. Accountability systems 
have become such important cornerstones of 
state policy that some policymakers are 
afraid to modify them, worried that oppo- 
nents will seize the opportunity of revision to 
undermine the whole system. Their concern 
has mounted as backlash to accountability 
policies has gained force. However, the oppo- 
sition includes not only those philosophically 
against state-directed testing and conse- 
quences for performance, but many who 
would be supporters in principle but are con- 
cerned about such issues as unequal oppor- 
tunity to learn, disparate impacts, reliance on 
single measures, and harsh consequences for 
students. Some of the latter group would be 
willing to come to the table and discuss ways 
to improve accountability systems, making it 
politically possible to modify these systems 
without risking their complete undoing. This 
was the case in several states that modified 
their high school exit exam policies over the 
past several years. Continued leadership and
business support, willingness to commission 
and attend to research about the state of 
opportunity to learn, and readiness to com- 
promise on specific issues like test content 
and effective dates in order to maintain the 
basic program permitted refinements to 
occur (Fuhrman, Goertz, & Duffy, in press). 
Policymakers must take advantage of lessons 
from experience with new accountability sys- 
tems and use that knowledge to change and 
improve the systems over time. 